CUR matrix decompositions for improved data analysis.

Authors

  • Michael W Mahoney
  • Petros Drineas
Abstract

Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes generating the data. In this article, we develop CUR matrix decompositions for improved data analysis. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Because they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn (to the extent that the original data are). We present an algorithm that preferentially chooses columns and rows that exhibit high "statistical leverage" and, thus, in a very precise statistical sense, exert a disproportionately large "influence" on the best low-rank fit of the data matrix. By selecting columns and rows in this manner, we obtain improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work. In addition, since the construction involves computing quantities with a natural and widely studied statistical interpretation, we can leverage ideas from diagnostic regression analysis to employ these matrix decompositions for exploratory data analysis.
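The leverage-based selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' reference implementation. It assumes the normalized leverage score of column j is the squared norm of the j-th coordinates of the top-k right singular vectors divided by k, that each column is kept independently with probability min(1, c * score), and that the middle factor is formed as U = C⁺ A R⁺. The function names, the oversampling parameter `c`, and the pseudoinverse cutoff `rcond=1e-10` are hypothetical choices made for this example.

```python
import numpy as np

def leverage_scores(A, k):
    """Normalized statistical leverage scores of the columns of A
    with respect to its best rank-k approximation (they sum to 1)."""
    # Rows of Vt are the right singular vectors of A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return np.sum(Vt[:k, :] ** 2, axis=0) / k

def column_select(A, k, c, rng):
    """Keep column j independently with probability min(1, c * score_j).
    The parameter c (assumed here) controls the expected number of columns."""
    probs = np.minimum(1.0, c * leverage_scores(A, k))
    return np.flatnonzero(rng.random(A.shape[1]) < probs)

def cur(A, k, c, seed=0):
    """Sketch of a CUR decomposition: sample columns of A, sample rows
    of A (columns of A.T), then set U = pinv(C) @ A @ pinv(R)."""
    rng = np.random.default_rng(seed)
    cols = column_select(A, k, c, rng)
    rows = column_select(A.T, k, c, rng)
    C = A[:, cols]          # actual columns of the data matrix
    R = A[rows, :]          # actual rows of the data matrix
    # rcond is an assumed tolerance that guards against tiny singular values.
    U = np.linalg.pinv(C, rcond=1e-10) @ A @ np.linalg.pinv(R, rcond=1e-10)
    return C, U, R, cols, rows
```

Calling `cur(A, k=2, c=10)` on a data matrix `A` returns the factors together with the indices of the selected columns and rows, so the chosen items can be inspected directly. Larger values of `c` keep more columns and rows and generally give a closer fit.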

Similar resources

Deterministic CUR for Improved Large-Scale Data Analysis: An Empirical Study

Low-rank approximations that are computed from selected rows and columns of a given data matrix have attracted considerable attention lately. They have been proposed as an alternative to the SVD because they naturally lead to interpretable decompositions, which have been shown to be successful in applications such as fraud detection, fMRI segmentation, and collaborative filtering. The CUR decompositio...

Identifying important ions and positions in mass spectrometry imaging data using CUR matrix decompositions.

Mass spectrometry imaging enables label-free, high-resolution spatial mapping of the chemical composition of complex, biological samples. Typical experiments require selecting ions and/or positions from the images: ions for fragmentation studies to identify keystone compounds and positions for follow up validation measurements using microdissection or other orthogonal techniques. Unfortunately,...

Efficient algorithms for cur and interpolative matrix decompositions

The manuscript describes efficient algorithms for the computation of the CUR and ID decompositions. The methods used are based on simple modifications to the classical truncated pivoted QR decomposition, which means that highly optimized library codes can be utilized for implementation. For certain applications, further acceleration can be attained by incorporating techniques based on randomize...

RSVDPACK: An implementation of randomized algorithms for computing the singular value, interpolative, and CUR decompositions of matrices on multi-core and GPU architectures

RSVDPACK is a library of functions for computing low-rank approximations of matrices. The library includes functions for computing standard (partial) factorizations such as the Singular Value Decomposition (SVD), and also so-called "structure preserving" factorizations such as the Interpolative Decomposition (ID) and the CUR decomposition. The ID and CUR factorizations pick subsets of the rows/...

Improving CUR Matrix Decomposition and Nyström Approximation via Adaptive Sampling

The CUR matrix decomposition and Nyström method are two important low-rank matrix approximation techniques. The Nyström method approximates a positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, the CUR decomposition can be regarded as an extension of the Nyström method. In this p...

Journal:
  • Proceedings of the National Academy of Sciences of the United States of America

Volume 106, Issue 3

Pages: -

Publication date: 2009